Update dependency transformers to v4.53.0 [SECURITY] #30
Open
renovate wants to merge 1 commit into main from renovate/pypi-transformers-vulnerability
Conversation
thepetk approved these changes on Mar 5, 2025
lgtm
This PR contains the following updates:
| Package | Change |
|---|---|
| transformers | `~=4.40.2` -> `~=4.53.0` |
| transformers | `==4.45.2` -> `==4.53.0` |
| transformers | `==4.41.2` -> `==4.53.0` |
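As a quick post-merge sanity check, here is a minimal sketch (assuming the `packaging` package is available alongside pip) to confirm the installed `transformers` meets the patched floor:

```python
# Hedged sketch: verify the installed transformers version meets the 4.53.0 floor pinned by this PR.
from importlib.metadata import version          # stdlib, Python 3.8+
from packaging.version import Version           # assumes the `packaging` package is installed

installed = Version(version("transformers"))
assert installed >= Version("4.53.0"), f"transformers {installed} predates the patched release"
```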
Transformers Regular Expression Denial of Service (ReDoS) vulnerability
CVE-2024-12720 / GHSA-6rvg-6v2m-4j46
More information
Details
A Regular Expression Denial of Service (ReDoS) vulnerability was identified in the huggingface/transformers library, specifically in the file tokenization_nougat_fast.py. The vulnerability occurs in the post_process_single() function, where a regular expression processes specially crafted input. The issue stems from the regex exhibiting exponential time complexity under certain conditions, leading to excessive backtracking. This can result in significantly high CPU usage and potential application downtime, effectively creating a Denial of Service (DoS) scenario. The affected version is v4.46.3.
Severity
CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:L
References
This data is provided by OSV and the GitHub Advisory Database (CC-BY 4.0).
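For illustration only, here is a minimal sketch of the catastrophic-backtracking behaviour these ReDoS advisories describe. The pattern below is a generic nested-quantifier example, not the regex from tokenization_nougat_fast.py:

```python
# Generic ReDoS illustration: a nested quantifier makes the engine try exponentially many
# ways to partition the input before the trailing "!" forces the overall match to fail.
import re
import time

pattern = re.compile(r"(a+)+$")   # classic pathological shape; NOT the library's regex
payload = "a" * 24 + "!"          # each extra "a" roughly doubles the matching time

start = time.perf_counter()
pattern.match(payload)            # returns None, but only after extensive backtracking
print(f"took {time.perf_counter() - start:.2f}s on a {len(payload)}-character input")
```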
Deserialization of Untrusted Data in Hugging Face Transformers
CVE-2024-11392 / GHSA-qxrp-vhvm-j765 / PYSEC-2024-227
More information
Details
Hugging Face Transformers MobileViTV2 Deserialization of Untrusted Data Remote Code Execution Vulnerability. This vulnerability allows remote attackers to execute arbitrary code on affected installations of Hugging Face Transformers. User interaction is required to exploit this vulnerability in that the target must visit a malicious page or open a malicious file.
The specific flaw exists within the handling of configuration files. The issue results from the lack of proper validation of user-supplied data, which can result in deserialization of untrusted data. An attacker can leverage this vulnerability to execute code in the context of the current user. Was ZDI-CAN-24322.
Severity
CVSS:3.0/AV:N/AC:H/PR:N/UI:R/S:U/C:H/I:H/A:H
References
This data is provided by OSV and the GitHub Advisory Database (CC-BY 4.0).
CVE-2024-11393 / GHSA-wrfc-pvp9-mr9g / PYSEC-2024-228
More information
Details
Hugging Face Transformers MaskFormer Model Deserialization of Untrusted Data Remote Code Execution Vulnerability. This vulnerability allows remote attackers to execute arbitrary code on affected installations of Hugging Face Transformers. User interaction is required to exploit this vulnerability in that the target must visit a malicious page or open a malicious file.
The specific flaw exists within the parsing of model files. The issue results from the lack of proper validation of user-supplied data, which can result in deserialization of untrusted data. An attacker can leverage this vulnerability to execute code in the context of the current user. Was ZDI-CAN-25191.
Severity
CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H
References
This data is provided by OSV and the PyPI Advisory Database (CC-BY 4.0).
CVE-2024-11394 / GHSA-hxxf-235m-72v3 / PYSEC-2024-229
More information
Details
Hugging Face Transformers Trax Model Deserialization of Untrusted Data Remote Code Execution Vulnerability. This vulnerability allows remote attackers to execute arbitrary code on affected installations of Hugging Face Transformers. User interaction is required to exploit this vulnerability in that the target must visit a malicious page or open a malicious file.
The specific flaw exists within the handling of model files. The issue results from the lack of proper validation of user-supplied data, which can result in deserialization of untrusted data. An attacker can leverage this vulnerability to execute code in the context of the current user. Was ZDI-CAN-25012.
Severity
CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H
References
This data is provided by OSV and the PyPI Advisory Database (CC-BY 4.0).
CVE-2024-11392 / GHSA-qxrp-vhvm-j765 / PYSEC-2024-227
More information
Details
Hugging Face Transformers MobileViTV2 Deserialization of Untrusted Data Remote Code Execution Vulnerability. This vulnerability allows remote attackers to execute arbitrary code on affected installations of Hugging Face Transformers. User interaction is required to exploit this vulnerability in that the target must visit a malicious page or open a malicious file.
The specific flaw exists within the handling of configuration files. The issue results from the lack of proper validation of user-supplied data, which can result in deserialization of untrusted data. An attacker can leverage this vulnerability to execute code in the context of the current user. Was ZDI-CAN-24322.
Severity
CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H
References
This data is provided by OSV and the PyPI Advisory Database (CC-BY 4.0).
Deserialization of Untrusted Data in Hugging Face Transformers
CVE-2024-11394 / GHSA-hxxf-235m-72v3 / PYSEC-2024-229
More information
Details
Hugging Face Transformers Trax Model Deserialization of Untrusted Data Remote Code Execution Vulnerability. This vulnerability allows remote attackers to execute arbitrary code on affected installations of Hugging Face Transformers. User interaction is required to exploit this vulnerability in that the target must visit a malicious page or open a malicious file.
The specific flaw exists within the handling of model files. The issue results from the lack of proper validation of user-supplied data, which can result in deserialization of untrusted data. An attacker can leverage this vulnerability to execute code in the context of the current user. Was ZDI-CAN-25012.
Severity
CVSS:3.0/AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H
References
This data is provided by OSV and the GitHub Advisory Database (CC-BY 4.0).
Deserialization of Untrusted Data in Hugging Face Transformers
CVE-2024-11393 / GHSA-wrfc-pvp9-mr9g / PYSEC-2024-228
More information
Details
Hugging Face Transformers MaskFormer Model Deserialization of Untrusted Data Remote Code Execution Vulnerability. This vulnerability allows remote attackers to execute arbitrary code on affected installations of Hugging Face Transformers. User interaction is required to exploit this vulnerability in that the target must visit a malicious page or open a malicious file.
The specific flaw exists within the parsing of model files. The issue results from the lack of proper validation of user-supplied data, which can result in deserialization of untrusted data. An attacker can leverage this vulnerability to execute code in the context of the current user. Was ZDI-CAN-25191.
Severity
CVSS:3.0/AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H
References
This data is provided by OSV and the GitHub Advisory Database (CC-BY 4.0).
CVE-2025-2099 / GHSA-qq3j-4f4f-9583 / PYSEC-2025-40
More information
Details
A vulnerability in the `preprocess_string()` function of the `transformers.testing_utils` module in huggingface/transformers version v4.48.3 allows for a Regular Expression Denial of Service (ReDoS) attack. The regular expression used to process code blocks in docstrings contains nested quantifiers, leading to exponential backtracking when processing input with a large number of newline characters. An attacker can exploit this by providing a specially crafted payload, causing high CPU usage and potential application downtime, effectively resulting in a Denial of Service (DoS) scenario.
Severity
CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H
References
This data is provided by OSV and the PyPI Advisory Database (CC-BY 4.0).
Transformers Regular Expression Denial of Service (ReDoS) vulnerability
CVE-2025-1194 / GHSA-fpwr-67px-3qhx
More information
Details
A Regular Expression Denial of Service (ReDoS) vulnerability was identified in the huggingface/transformers library, specifically in the file `tokenization_gpt_neox_japanese.py` of the GPT-NeoX-Japanese model. The vulnerability occurs in the SubWordJapaneseTokenizer class, where regular expressions process specially crafted inputs. The issue stems from a regex exhibiting exponential complexity under certain conditions, leading to excessive backtracking. This can result in high CPU usage and potential application downtime, effectively creating a Denial of Service (DoS) scenario. The affected version is v4.48.1 (latest).
Severity
CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:N/I:N/A:L
References
This data is provided by OSV and the GitHub Advisory Database (CC-BY 4.0).
Hugging Face Transformers Regular Expression Denial of Service
CVE-2025-2099 / GHSA-qq3j-4f4f-9583 / PYSEC-2025-40
More information
Details
A Regular Expression Denial of Service (ReDoS) exists in the `preprocess_string()` function of the `transformers.testing_utils` module. In versions before 4.50.0, the regex used to process code blocks in docstrings contains nested quantifiers that can trigger catastrophic backtracking when given inputs with many newline characters. An attacker who can supply such input to `preprocess_string()` (or code paths that call it) can force excessive CPU usage and degrade availability.
Fix: released in 4.50.0, which rewrites the regex to avoid the inefficient pattern.
Affected versions: < 4.50.0. Patched version: 4.50.0.
Severity
CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:L
References
This data is provided by OSV and the GitHub Advisory Database (CC-BY 4.0).
Transformers's ReDoS vulnerability in get_configuration_file can lead to catastrophic backtracking
CVE-2025-3263 / GHSA-q2wp-rjmx-x6x9
More information
Details
A Regular Expression Denial of Service (ReDoS) vulnerability was discovered in the Hugging Face Transformers library, specifically in the `get_configuration_file()` function within the `transformers.configuration_utils` module. The affected version is 4.49.0, and the issue is resolved in version 4.51.0. The vulnerability arises from the use of a regular expression pattern `config\.(.*)\.json` that can be exploited to cause excessive CPU consumption through crafted input strings, leading to catastrophic backtracking. This can result in model serving disruption, resource exhaustion, and increased latency in applications using the library.
Severity
CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:L
References
This data is provided by OSV and the GitHub Advisory Database (CC-BY 4.0).
Transformers vulnerable to ReDoS attack through its get_imports() function
CVE-2025-3264 / GHSA-jjph-296x-mrcr
More information
Details
A Regular Expression Denial of Service (ReDoS) vulnerability was discovered in the Hugging Face Transformers library, specifically in the `get_imports()` function within `dynamic_module_utils.py`. This vulnerability affects version 4.49.0 and is fixed in version 4.51.0. The issue arises from a regular expression pattern `\s*try\s*:.*?except.*?:` used to filter out try/except blocks from Python code, which can be exploited to cause excessive CPU consumption through crafted input strings due to catastrophic backtracking. This vulnerability can lead to remote code loading disruption, resource exhaustion in model serving, supply chain attack vectors, and development pipeline disruption.
Severity
CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:L
References
This data is provided by OSV and the GitHub Advisory Database (CC-BY 4.0).
Transformers's Improper Input Validation vulnerability can be exploited through username injection
CVE-2025-3777 / GHSA-phhr-52qp-3mj4
More information
Details
Hugging Face Transformers versions up to 4.49.0 are affected by an improper input validation vulnerability in the `image_utils.py` file. The vulnerability arises from insecure URL validation using the `startswith()` method, which can be bypassed through URL username injection. This allows attackers to craft URLs that appear to be from YouTube but resolve to malicious domains, potentially leading to phishing attacks, malware distribution, or data exfiltration. The issue is fixed in version 4.52.1.
Severity
CVSS:3.0/AV:N/AC:L/PR:L/UI:R/S:U/C:L/I:N/A:N
References
This data is provided by OSV and the GitHub Advisory Database (CC-BY 4.0).
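Here is a hedged illustration of the bypass class described above (not the library's exact check): a plain prefix test accepts a URL whose apparent host is really the userinfo portion, so the request goes elsewhere.

```python
# Username-injection bypass of a naive startswith() URL check (illustrative values only).
from urllib.parse import urlparse

url = "https://www.youtube.com@evil.example/watch?v=abc123"

print(url.startswith("https://www.youtube.com"))  # True -> the naive prefix check passes
print(urlparse(url).hostname)                     # 'evil.example' -> where the request actually goes
```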
Transformers is vulnerable to ReDoS attack through its DonutProcessor class
CVE-2025-3933 / GHSA-37mw-44qp-f5jm
More information
Details
A Regular Expression Denial of Service (ReDoS) vulnerability was discovered in the Hugging Face Transformers library, specifically within the DonutProcessor class's `token2json()` method. This vulnerability affects versions 4.51.3 and earlier, and is fixed in version 4.52.1. The issue arises from the regex pattern `<s_(.*?)>` which can be exploited to cause excessive CPU consumption through crafted input strings due to catastrophic backtracking. This vulnerability can lead to service disruption, resource exhaustion, and potential API service vulnerabilities, impacting document processing tasks using the Donut model.
Severity
CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:L
References
This data is provided by OSV and the GitHub Advisory Database (CC-BY 4.0).
Hugging Face Transformers library has Regular Expression Denial of Service
CVE-2025-6051 / GHSA-rcv9-qm8p-9p6j
More information
Details
A Regular Expression Denial of Service (ReDoS) vulnerability was discovered in the Hugging Face Transformers library, specifically within the `normalize_numbers()` method of the `EnglishNormalizer` class. This vulnerability affects versions up to 4.52.4 and is fixed in version 4.53.0. The issue arises from the method's handling of numeric strings, which can be exploited using crafted input strings containing long sequences of digits, leading to excessive CPU consumption. This vulnerability impacts text-to-speech and number normalization tasks, potentially causing service disruption, resource exhaustion, and API vulnerabilities.
Severity
CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:L
References
This data is provided by OSV and the GitHub Advisory Database (CC-BY 4.0).
Hugging Face Transformers is vulnerable to ReDoS through its MarianTokenizer
CVE-2025-6638 / GHSA-59p9-h35m-wg4g
More information
Details
A Regular Expression Denial of Service (ReDoS) vulnerability was discovered in the Hugging Face Transformers library, specifically affecting the MarianTokenizer's `remove_language_code()` method. This vulnerability is present in version 4.52.4 and has been fixed in version 4.53.0. The issue arises from inefficient regex processing, which can be exploited by crafted input strings containing malformed language code patterns, leading to excessive CPU consumption and potential denial of service.
Severity
CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:L
References
This data is provided by OSV and the GitHub Advisory Database (CC-BY 4.0).
Hugging Face Transformers vulnerable to Regular Expression Denial of Service (ReDoS) in the AdamWeightDecay optimizer
CVE-2025-6921 / GHSA-4w7r-h757-3r74
More information
Details
The huggingface/transformers library, versions prior to 4.53.0, is vulnerable to Regular Expression Denial of Service (ReDoS) in the AdamWeightDecay optimizer. The vulnerability arises from the _do_use_weight_decay method, which processes user-controlled regular expressions in the include_in_weight_decay and exclude_from_weight_decay lists. Malicious regular expressions can cause catastrophic backtracking during the re.search call, leading to 100% CPU utilization and a denial of service. This issue can be exploited by attackers who can control the patterns in these lists, potentially causing the machine learning task to hang and rendering services unresponsive.
Severity
CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:L
References
This data is provided by OSV and the GitHub Advisory Database (CC-BY 4.0).
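A simplified sketch of the decision the advisory describes, showing how caller-supplied patterns reach `re.search`; the function name and body here are illustrative, not the optimizer's actual code:

```python
# Illustrative only: caller-supplied regexes are applied to parameter names with re.search,
# so a pathological pattern in either list can stall the optimizer step.
import re

def should_apply_weight_decay(param_name, include_patterns=None, exclude_patterns=None):
    if exclude_patterns and any(re.search(p, param_name) for p in exclude_patterns):
        return False
    if include_patterns:
        return any(re.search(p, param_name) for p in include_patterns)
    return True

# Typical benign usage: skip decay for bias and LayerNorm weights.
print(should_apply_weight_decay("encoder.layer.0.attention.output.dense.bias",
                                exclude_patterns=[r"bias", r"LayerNorm"]))  # False
```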
Hugging Face Transformers Regular Expression Denial of Service (ReDoS) vulnerability
CVE-2025-5197 / GHSA-9356-575x-2w9m
More information
Details
A Regular Expression Denial of Service (ReDoS) vulnerability exists in the Hugging Face Transformers library, specifically in the `convert_tf_weight_name_to_pt_weight_name()` function. This function, responsible for converting TensorFlow weight names to PyTorch format, uses a regex pattern `/[^/]*___([^/]*)/` that can be exploited to cause excessive CPU consumption through crafted input strings due to catastrophic backtracking. The vulnerability affects versions up to 4.51.3 and is fixed in version 4.53.0. This issue can lead to service disruption, resource exhaustion, and potential API service vulnerabilities, impacting model conversion processes between TensorFlow and PyTorch formats.
Severity
CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:L
References
This data is provided by OSV and the GitHub Advisory Database (CC-BY 4.0).
GitHub Vulnerability Alerts
CVE-2024-11392
Hugging Face Transformers MobileViTV2 Deserialization of Untrusted Data Remote Code Execution Vulnerability. This vulnerability allows remote attackers to execute arbitrary code on affected installations of Hugging Face Transformers. User interaction is required to exploit this vulnerability in that the target must visit a malicious page or open a malicious file.
The specific flaw exists within the handling of configuration files. The issue results from the lack of proper validation of user-supplied data, which can result in deserialization of untrusted data. An attacker can leverage this vulnerability to execute code in the context of the current user. Was ZDI-CAN-24322.
CVE-2024-11394
Hugging Face Transformers Trax Model Deserialization of Untrusted Data Remote Code Execution Vulnerability. This vulnerability allows remote attackers to execute arbitrary code on affected installations of Hugging Face Transformers. User interaction is required to exploit this vulnerability in that the target must visit a malicious page or open a malicious file.
The specific flaw exists within the handling of model files. The issue results from the lack of proper validation of user-supplied data, which can result in deserialization of untrusted data. An attacker can leverage this vulnerability to execute code in the context of the current user. Was ZDI-CAN-25012.
CVE-2024-11393
Hugging Face Transformers MaskFormer Model Deserialization of Untrusted Data Remote Code Execution Vulnerability. This vulnerability allows remote attackers to execute arbitrary code on affected installations of Hugging Face Transformers. User interaction is required to exploit this vulnerability in that the target must visit a malicious page or open a malicious file.
The specific flaw exists within the parsing of model files. The issue results from the lack of proper validation of user-supplied data, which can result in deserialization of untrusted data. An attacker can leverage this vulnerability to execute code in the context of the current user. Was ZDI-CAN-25191.
CVE-2024-12720
A Regular Expression Denial of Service (ReDoS) vulnerability was identified in the huggingface/transformers library, specifically in the file tokenization_nougat_fast.py. The vulnerability occurs in the post_process_single() function, where a regular expression processes specially crafted input. The issue stems from the regex exhibiting exponential time complexity under certain conditions, leading to excessive backtracking. This can result in significantly high CPU usage and potential application downtime, effectively creating a Denial of Service (DoS) scenario. The affected version is v4.46.3.
CVE-2025-1194
A Regular Expression Denial of Service (ReDoS) vulnerability was identified in the huggingface/transformers library, specifically in the file `tokenization_gpt_neox_japanese.py` of the GPT-NeoX-Japanese model. The vulnerability occurs in the SubWordJapaneseTokenizer class, where regular expressions process specially crafted inputs. The issue stems from a regex exhibiting exponential complexity under certain conditions, leading to excessive backtracking. This can result in high CPU usage and potential application downtime, effectively creating a Denial of Service (DoS) scenario. The affected version is v4.48.1 (latest).
Release Notes
huggingface/transformers (transformers)
v4.53.0
Release v4.53.0
Gemma3n
Gemma 3n models are designed for efficient execution on low-resource devices. They are capable of multimodal input, handling text, image, video, and audio input, and generating text outputs, with open weights for pre-trained and instruction-tuned variants. These models were trained with data in over 140 spoken languages.
Gemma 3n models use selective parameter activation technology to reduce resource requirements. This technique allows the models to operate at an effective size of 2B and 4B parameters, which is lower than the total number of parameters they contain. For more information on Gemma 3n's efficient parameter management technology, see the Gemma 3n page.
Dia
Dia is an open-source text-to-speech (TTS) model (1.6B parameters) developed by Nari Labs.
It can generate highly realistic dialogue from transcript including nonverbal communications such as laughter and coughing.
Furthermore, emotion and tone control is also possible via audio conditioning (voice cloning).
Model Architecture:
Dia is an encoder-decoder transformer based on the original transformer architecture. However, some more modern features such as
rotary positional embeddings (RoPE) are also included. For its text portion (encoder), a byte tokenizer is utilized while
for the audio portion (decoder), a pretrained codec model DAC is used - DAC encodes speech into discrete codebook
tokens and decodes them back into audio.
Kyutai Speech-to-Text
Kyutai STT is a speech-to-text model architecture based on the Mimi codec, which encodes audio into discrete tokens in a streaming fashion, and a Moshi-like autoregressive decoder. Kyutai’s lab has released two model checkpoints:
Read more about the model in the documentation
V-JEPA 2
V-JEPA 2 is a self-supervised approach to training video encoders developed by FAIR, Meta. Using internet-scale video data, V-JEPA 2 attains state-of-the-art performance on motion understanding and human action anticipation tasks. V-JEPA 2-AC is a latent action-conditioned world model post-trained from V-JEPA 2 (using a small amount of robot trajectory interaction data) that solves robot manipulation tasks without environment-specific data collection or task-specific training or calibration.
Read more about the model in the documentation.
Arcee
Arcee is a decoder-only transformer model based on the Llama architecture with a key modification: it uses ReLU² (ReLU-squared) activation in the MLP blocks instead of SiLU, following recent research showing improved training efficiency with squared activations. This architecture is designed for efficient training and inference while maintaining the proven stability of the Llama design.
The Arcee model is architecturally similar to Llama but uses x * relu(x) in MLP layers for improved gradient flow and is optimized for efficiency in both training and inference scenarios.
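A minimal sketch of the ReLU² activation described above (x * relu(x)); this is illustrative, not Arcee's actual module:

```python
# ReLU^2: zero for negative inputs, x**2 for non-negative inputs (equivalent to relu(x) ** 2).
import torch

def relu_squared(x: torch.Tensor) -> torch.Tensor:
    return x * torch.relu(x)

print(relu_squared(torch.tensor([-1.0, 0.5, 2.0])))  # tensor([0.0000, 0.2500, 4.0000])
```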
Read more about the model in the documentation.
ColQwen2
ColQwen2 is a variant of the ColPali model designed to retrieve documents by analyzing their visual features. Unlike traditional systems that rely heavily on text extraction and OCR, ColQwen2 treats each page as an image. It uses the Qwen2-VL backbone to capture not only text, but also the layout, tables, charts, and other visual elements to create detailed multi-vector embeddings that can be used for retrieval by computing pairwise late interaction similarity scores. This offers a more comprehensive understanding of documents and enables more efficient and accurate retrieval.
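To make the retrieval step above concrete, here is a generic sketch of late-interaction (MaxSim-style) scoring over multi-vector embeddings; the shapes and normalisation are assumptions, not ColQwen2's exact implementation:

```python
# MaxSim-style late interaction: for each query vector take the best-matching document vector,
# then sum those maxima to get a single query-document relevance score.
import torch

def late_interaction_score(query_vecs: torch.Tensor, doc_vecs: torch.Tensor) -> torch.Tensor:
    # query_vecs: (num_query_tokens, dim), doc_vecs: (num_doc_patches, dim), assumed L2-normalised
    sim = query_vecs @ doc_vecs.T          # pairwise cosine similarities
    return sim.max(dim=1).values.sum()     # max over document vectors, summed over query vectors

q = torch.nn.functional.normalize(torch.randn(8, 128), dim=-1)
d = torch.nn.functional.normalize(torch.randn(1024, 128), dim=-1)
print(late_interaction_score(q, d))
```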
Read more about the model in the documentation.
MiniMax
MiniMax is a powerful language model with 456 billion total parameters, of which 45.9 billion are activated per token. To better unlock the long context capabilities of the model, MiniMax adopts a hybrid architecture that combines Lightning Attention, Softmax Attention and Mixture-of-Experts (MoE). Leveraging advanced parallel strategies and innovative compute-communication overlap methods, such as Linear Attention Sequence Parallelism Plus (LASP+), varlen ring attention, Expert Tensor Parallel (ETP), etc., MiniMax's training context length is extended to 1 million tokens, and it can handle a context of up to 4 million tokens during inference. On various academic benchmarks, MiniMax also demonstrates the performance of a top-tier model.
The architecture of MiniMax is briefly described as follows:
For more details refer to the release blog post.
Read more about the model in the documentation.
Encoder-Decoder Gemma
T5Gemma (aka encoder-decoder Gemma) was proposed in a research paper by Google. It is a family of encoder-decoder large language models, developed by adapting pretrained decoder-only models into encoder-decoder. T5Gemma includes pretrained and instruction-tuned variants. The architecture is based on the transformer encoder-decoder design following T5, with improvements from Gemma 2: GQA, RoPE, GeGLU activation, RMSNorm, and interleaved local/global attention.
T5Gemma has two groups of model sizes: 1) Gemma 2 sizes (2B-2B, 9B-2B, and 9B-9B), which are based on the official Gemma 2 models (2B and 9B); and 2) T5 sizes (Small, Base, Large, and XL), which are pretrained under the Gemma 2 framework following the T5 configuration. In addition, we also provide a model at ML size (medium large, ~2B in total), which sits between T5 Large and T5 XL.
The pretrained variants are trained with two objectives: prefix language modeling with knowledge distillation (PrefixLM) and UL2, separately. We release both variants for each model size. The instruction-tuned variants were post-trained with supervised fine-tuning and reinforcement learning.
Read more about the model in the documentation.
GLM-4.1V
The GLM-4.1V model architecture is added to transformers; no models have yet been released with that architecture. Stay tuned for the GLM team's upcoming releases!
Read more about the model in the documentation.
Falcon H1
The FalconH1 model was developed by the TII Pretraining team. A comprehensive research paper covering the architecture, pretraining dynamics, experimental results, and conclusions is forthcoming. You can read more about this series on this website.
Read more about the model in the documentation.
LightGlue
The LightGlue model was proposed in LightGlue: Local Feature Matching at Light Speed
by Philipp Lindenberger, Paul-Edouard Sarlin and Marc Pollefeys.
Similar to SuperGlue, this model consists of matching
two sets of local features extracted from two images; its goal is to be faster than SuperGlue. Paired with the
SuperPoint model, it can be used to match two images and
estimate the pose between them. This model is useful for tasks such as image matching, homography estimation, etc.
The abstract from the paper is the following:
We introduce LightGlue, a deep neural network that learns to match local features across images. We revisit multiple
design decisions of SuperGlue, the state of the art in sparse matching, and derive simple but effective improvements.
Cumulatively, they make LightGlue more efficient - in terms of both memory and computation, more accurate, and much
easier to train. One key property is that LightGlue is adaptive to the difficulty of the problem: the inference is much
faster on image pairs that are intuitively easy to match, for example because of a larger visual overlap or limited
appearance change. This opens up exciting prospects for deploying deep matchers in latency-sensitive applications like
3D reconstruction. The code and trained models are publicly available at this https URL
Read more about the model in the documentation.
dots.llm1
The abstract from the report is the following:
Mixture of Experts (MoE) models have emerged as a promising paradigm for scaling language models efficiently by activating only a subset of parameters for each input token. In this report, we present dots.llm1, a large-scale MoE model that activates 14B parameters out of a total of 142B parameters, delivering performance on par with state-of-the-art models while reducing training and inference costs. Leveraging our meticulously crafted and efficient data processing pipeline, dots.llm1 achieves performance comparable to Qwen2.5-72B after pretraining on high-quality corpus and post-training to fully unlock its capabilities. Notably, no synthetic data is used during pretraining. To foster further research, we open-source intermediate training checkpoints spanning the entire training process, providing valuable insights into the learning dynamics of large language models.
Read more about the model in the documentation.
SmolLM3
SmolLM3 is a fully open, compact language model designed for efficient deployment while maintaining strong performance. It uses a Transformer decoder architecture with Grouped Query Attention (GQA) to reduce the kv cache, and no RoPE, enabling improved performance on long-context tasks. It is trained using a multi-stage training approach on high-quality public datasets across web, code, and math domains. The model is multilingual and supports very large context lengths. The instruct variant is optimized for reasoning and tool use.
Read more about the model in the documentation.
Performance optimizations
Kernels
In previous versions, installing the `kernels` library would automatically activate the custom kernels added to `transformers`, because the `@use_kernel_forward_from_the_hub` decorator directly swapped out the model's forward method. This implicit behavior caused several issues for users, including problems with `torch.compile`, non-determinism, and inconsistent outputs.
To address this, we've introduced a new opt-in mechanism called `kernelize`. You can now enable kernel usage explicitly by passing `use_kernels=True` to `from_pretrained`. The `use_kernel_forward_from_the_hub` decorator now simply stores the kernel name that the user wants to use, and `kernelize` handles the rest under the hood.
Example
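A short sketch of the opt-in described above; the checkpoint name is a placeholder and the `kernels` package is assumed to be installed:

```python
# Opt in to Hub kernels explicitly via the new kernelize mechanism (per the release notes above).
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "your-org/your-model",   # hypothetical checkpoint name
    use_kernels=True,        # kernels are no longer swapped in implicitly just by installing `kernels`
)
```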